The impact of genetic structure on sequencing analysis
Authors
Abstract
BACKGROUND Genome-wide association studies have made substantial progress in identifying common variants associated with human diseases. Despite such success, a large portion of heritability remains unexplained. Evolutionary theory and empirical studies suggest that rare mutations could play an important role in human diseases, which motivates comprehensive investigation of rare variants in sequencing studies. To explore the association of rare variants with human diseases, many statistical approaches have been developed with different ways of modeling genetic structure (ie, linkage disequilibrium). Nevertheless, the appropriate strategy to model genetic structure of sequencing data and its effect on association analysis have not been well studied.
METHODS We investigate 3 statistical approaches that use 3 different strategies to model the genetic structure of sequencing data. We proceed by comparing a burden test that assumes independence among sequencing variants, a burden test that considers pairwise linkage disequilibrium (LD), and a functional analysis of variance (FANOVA) test that models genetic data through fitting continuous curves on individuals' genotypes.
RESULTS Through simulations, we find that FANOVA attains better or comparable performance to the 2 burden tests. Overall, the burden test that considers pairwise LD has comparable performance to the burden test that assumes independence between sequencing variants. However, for 1 gene, where the disease-associated variant is located in an LD block, we find that considering pairwise LD could improve the test's performance.
CONCLUSIONS The structure of sequencing variants is complex in nature and its patterns vary across the whole genome. In certain cases (eg, a disease-susceptibility variant is in an LD block), ignoring the genetic structure in the association analysis could result in suboptimal performance. Through this study, we show that a functional-based method is promising for modeling the underlying genetic structure of sequencing data, which could lead to better performance.
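To make the first two strategies in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes genotypes are coded as 0/1/2 minor-allele counts, uses a simple sum-of-rare-alleles burden score with a permutation test (one common form of burden test that treats variants as independent), and uses the squared genotype correlation as a proxy for pairwise LD. All function names, the MAF threshold, and the toy data are hypothetical; the FANOVA approach is not shown.

```python
import random
from statistics import mean


def minor_allele_freq(genotypes, j):
    """Minor-allele frequency of variant j (genotypes coded 0/1/2)."""
    return sum(row[j] for row in genotypes) / (2 * len(genotypes))


def burden_scores(genotypes, maf_threshold=0.05):
    """Collapse rare variants into one score per individual.

    The score is the sum of minor-allele counts over variants whose
    MAF is above 0 and below maf_threshold (an assumed cutoff).
    Variants are simply added up, i.e. treated as independent.
    """
    n_variants = len(genotypes[0])
    rare = [j for j in range(n_variants)
            if 0 < minor_allele_freq(genotypes, j) < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def pairwise_r2(genotypes, j, k):
    """Squared Pearson correlation between variants j and k (LD proxy)."""
    x = [row[j] for row in genotypes]
    y = [row[k] for row in genotypes]
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return sxy * sxy / (sxx * syy)


def permutation_pvalue(scores, is_case, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the case/control mean difference."""
    rng = random.Random(seed)

    def abs_mean_diff(labels):
        cases = [s for s, c in zip(scores, labels) if c]
        controls = [s for s, c in zip(scores, labels) if not c]
        return abs(mean(cases) - mean(controls))

    observed = abs_mean_diff(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling keeps the number of cases fixed
        if abs_mean_diff(labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data (made up): 8 individuals, 3 variants.
    toy = [[1, 1, 0], [0, 0, 1], [0, 0, 0], [0, 0, 0],
           [0, 0, 2], [0, 0, 1], [0, 0, 0], [0, 0, 1]]
    status = [True, True, True, True, False, False, False, False]
    scores = burden_scores(toy, maf_threshold=0.1)
    print(scores, pairwise_r2(toy, 0, 1), permutation_pvalue(scores, status))
```

A burden test that accounts for pairwise LD could, for example, down-weight variants that are highly correlated with one another before summing them; the exact weighting used in the study is not described in the abstract, so it is left out here.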
Similar resources
Gene-Gene Interaction Study Between Genetic Polymorphisms of Folate Metabolism and MTR SNPs on Prognostic Features Impact for Breast Cancer
Background: Breast cancer (BC) is the second leading cause of cancer mortality after lung cancer and varies across the world due to genetic and environmental factors. In this study, we evaluated the interaction between the polymorphisms in genes encoding enzymes of folate metabolism: methylenetetrahydrofolate reductase (MTHFR), methionine synthesis reductase (MTR) with the BC prognostic factors. ...
Impact of Genetic Variants in Mir-122 Gene and its Flanking Regions on Hepatitis B Risk
MicroRNAs are small non-coding RNAs that are involved in gene expression regulation. Mir-122 was reported to inhibit hepatitis B virus (HBV), but little is known about the role of mir-122 polymorphisms in the development of HBV infection. The present study aimed to investigate the association between single nucleotide polymorphisms (SNPs) in the mir-122 gene region and HBV infection. Study cases were HB...
Molecular Identification of Rare Clinical Mycobacteria by Application of 16S-23S Spacer Region Sequencing
Objective(s) In addition to several molecular methods and in particular 16S rDNA analysis, the application of a more discriminatory genetic marker, i.e., 16S-23S internal transcribed spacer gene sequence has had a great impact on identification and classification of mycobacteria. In the current study we aimed to apply this sequencing power to conclusive identification of some Iranian clinical ...
Population genetic studies of Liza aurata using D-Loop sequencing in the southeast and southwest coasts of the Caspian Sea
Genetic diversity as an important marker of the ecological status of aquatic ecosystems is considered a unique and powerful tool to evaluate biological communities. In order to evaluate the genetic diversity among golden mullet species (Liza aurata) in the southeast and southwest coasts of the Caspian Sea by D-Loop gene sequencing, a total of 23 fin specimens of golden mullet were collected fro...
Population structure and variation in Persian sturgeon (Acipenser persicus) from the Caspian Sea as determined from mitochondrial DNA sequences of the control region
Mitochondrial DNA (mtDNA) control region sequences were analyzed to evaluate the population genetic structure of Persian sturgeon (Acipenser persicus) in the Caspian Sea. A total of 45 specimens were collected from different locations in the Caspian Sea. The mtDNA control region was amplified using PCR. Direct sequencing was performed according to a standard method. The results showed that 12 haplotypes...
Journal title:
Volume 10, Issue
Pages -
Publication date: 2016